VCF

Row

Filter

Species Number of variants (raw) Number of SNPs (raw) Number of indels (raw) Number of MNPs (raw) SNPs (after removing 5bps around indels, %) Filter QD<10 (%) Filter QUAL<30 (%) Filter AVGGQ<20 (%) Filter AVGDP<10 (%) Upper depth cutoff Filter AVGDPcutoff Filter MQ<40 (%) Filter FS<60 (%) Filter SOR<3 (%) Filter MQRankSum Filter ReadPosRankSum Number SNPs pass SNPs pass (% from indel5bp) SNPs pass (% from raw)
Aboye 78.708 M 64.269 M 16.31 M 11.651 M 60.3747 8.0715 0 11.2172 32.4490 42 0.6652 12.2994 0.3451 4.0712 0.0015 0.0002 28.504 M 59.9825 44.3503
Peryt 62.125 M 50.705 M 13.109 M 10.981 M 56.8140 9.6620 0 4.2660 10.1566 45 0.5506 5.8098 0.2458 2.0874 0.0014 0.0002 28.007 M 79.3502 55.2347
Cjuli 52.346 M 43.361 M 10.193 M 7.586 M 63.1596 8.6502 0 8.5288 21.1054 40 0.5890 8.7079 0.4691 2.7667 0.0007 0.0001 23.362 M 70.6602 53.8773
Dpunt 53.097 M 44.481 M 9.459 M 5.827 M 67.0303 8.0205 0 7.9463 15.8367 34 0.5573 9.8264 0.2480 3.3900 0.0005 0.0001 26.624 M 74.8044 59.8537
Gnige 54.858 M 43.486 M 13.058 M 11.351 M 51.6751 9.3795 0 9.7425 21.4863 48 1.0416 4.9387 1.7740 2.2622 0.0022 0.0003 19.423 M 68.5179 44.6659
Scabr 41.577 M 32.814 M 9.882 M 7.614 M 55.7784 7.4074 0 1.3659 3.4610 51 0.5945 1.8352 0.2611 1.0953 0.0022 0.0003 20.524 M 88.5020 62.5475
Spilc 53.509 M 39.236 M 16.384 M 12.063 M 48.1100 13.9271 0 13.2966 26.1786 51 0.5448 10.7754 0.2210 4.4008 0.0008 0.0003 15.543 M 60.3771 39.6142
Lmorm 29.992 M 24.632 M 5.918 M 3.757 M 67.2032 7.1111 0 4.0199 9.5812 55 0.5193 6.8731 0.4881 2.7649 0.0029 0.0003 16.482 M 81.7762 66.9149
Msurm 32.532 M 26.455 M 7.078 M 5.708 M 60.3445 7.6958 0 2.1531 4.6326 64 0.6013 3.5797 0.3723 2.0339 0.0030 0.0005 16.876 M 85.9666 63.7936
Cgale 28.08 M 23.266 M 5.481 M 3.644 M 68.5133 9.1767 0 9.2604 16.6887 62 0.7612 12.7323 0.6687 3.8219 0.0037 0.0007 14.102 M 73.2985 60.6105
Ssard 28.459 M 22.286 M 6.819 M 4.256 M 61.0671 6.6099 0 2.5047 6.4139 53 0.3512 2.7936 0.3158 1.6742 0.0012 0.0002 14.979 M 86.1882 67.2123
Scant 24.817 M 19.797 M 5.56 M 3.391 M 64.7264 10.4272 0 3.2330 6.7445 48 0.9775 5.4170 0.6916 2.4746 0.0025 0.0003 13.127 M 81.7206 66.3072
Lbude 22.488 M 18.594 M 4.1 M 1.436 M 73.3165 3.6672 0 1.2677 1.4681 62 0.6664 2.3091 0.2487 1.1356 0.0018 0.0003 15.419 M 93.5234 82.9260
Scine 21.28 M 16.382 M 5.467 M 3.512 M 59.4479 8.6106 0 3.8303 8.7132 71 0.8342 7.8203 1.0086 2.5603 0.0061 0.0006 10.209 M 80.7016 62.3201
Mmerl 18.857 M 13.227 M 6.437 M 4.433 M 48.7834 17.3570 0 7.9431 13.5601 72 1.3609 11.2877 1.4988 4.5794 0.0055 0.0009 6.099 M 66.2996 46.1110
Dlabr 13.366 M 10.586 M 3.001 M 1.541 M 67.5807 12.1566 0 2.5158 4.5965 49 1.3273 4.7543 0.5625 2.8535 0.0018 0.0004 7.427 M 82.2231 70.1604
Styph 14.813 M 11.897 M 3.303 M 2.235 M 62.5797 5.0903 0 1.0889 2.1179 118 0.5246 3.9792 0.7381 1.9171 0.0136 0.0014 8.425 M 90.8816 70.8123
Afall 15.839 M 10.419 M 5.767 M 2.786 M 51.4755 16.4986 0 8.7760 15.1129 62 0.9240 8.2993 0.9779 5.0038 0.0025 0.0012 5.491 M 67.3449 52.6973
Hgutt 4.176 M 3.008 M 1.254 M 0.6 M 61.1707 8.1805 0 2.2114 3.0368 108 0.2421 2.3358 1.1716 2.9105 0.0073 0.0009 2.243 M 87.7844 74.5633

Species

Species Number of SNPs Transitions (Ts) Transversions (Tv) Ts/Tv
Atherina boyeri 28.504 M 15283372 13220130 1.16
Alosa fallax 5.491 M 3074108 2416542 1.27
Coryphoblennius galerita 14.102 M 8008818 6092833 1.31
Coris julis 23.362 M 13321677 10039823 1.33
Dicentrarchus labrax 7.427 M 3989194 3438013 1.16
Diplodus puntazzo 26.624 M 16043571 10580196 1.52
Gobius niger 19.423 M 10990524 8432971 1.30
Hippocampus guttulatus 2.243 M 1272317 970318 1.31
Lophius budegassa 15.419 M 7771877 7647453 1.02
Lithognathus mormyrus 16.482 M 9505578 6976802 1.36
Merluccius merluccius 6.099 M 3692784 2406158 1.53
Mullus surmuletus 16.876 M 9734752 7141627 1.36
Pagellus erythrinus 28.007 M 16440864 11566160 1.42
Serranus cabrilla 20.524 M 11104907 9419531 1.18
Spondyliosoma cantharus 13.127 M 8249303 4877674 1.69
Symphodus cinereus 10.209 M 5921071 4288108 1.38
Sardina pilchardus 15.543 M 9119002 6424066 1.42
Sarda sarda 14.979 M 8751501 6227148 1.41
Syngnathus typhle 8.425 M 4935044 3489717 1.41

Substitution

Table

Depth

Number of SNPS

SNPS same as reference

Missing genotype

Phased genotype percentage

Phased genotype number

Number of genotype

SNPs Ts/Tv

SNPs Het/Hom

Depth distribution

Quality distribution

Packages used

          stringr           viridis       viridisLite           R.utils              R.oo       R.methodsS3 
          "1.4.1"           "0.6.2"           "0.4.1"          "2.11.0"          "1.24.0"           "1.8.1" 
       kableExtra       formattable               png            readxl            plotly             dplyr 
          "1.3.4"           "0.2.1"           "0.1-7"           "1.3.1"          "4.10.0"          "1.0.10" 
               DT           leaflet              ggsn           ggplot2 rnaturalearthdata     rnaturalearth 
           "0.20"           "2.1.1"           "0.5.0"           "3.4.0"           "0.1.0"           "0.1.0" 
               sf      RColorBrewer         rmarkdown     flexdashboard 
          "1.0-2"           "1.1-3"            "2.11"           "0.6.0" 
---
title: "Variant Calling"
author: "Pierre Barry"
date: "`r format(Sys.time(), '%d %B, %Y, %H:%M')`"
output: 
  flexdashboard::flex_dashboard:
    theme: paper
    orientation: rows
    social: menu
    source_code: embed
    vertical_layout: scroll
---

```{r global, include=FALSE}
list.of.packages <- c("RColorBrewer","stringr","kableExtra","formattable","ggplot2","plotly","DT")

for (i in list.of.packages){
  if (i %in% installed.packages()[,"Package"] == FALSE){
    install.packages(i)
  }
  eval(bquote(library(.(i))))
}
```

```{r, include = FALSE}
color_med_atl=data.frame(Location=c("Gulf of Lion","Costa Calida","Algarve","Bay of Biscay"),
                         Col=brewer.pal(n = 4, name = "RdBu"))
```


VCF {data-icon="ion-ios-settings"}
=======================================================================

```{r}
load(file=paste(path,"/output/variant_calling/VCF_SPECIES.Rdata",sep=""))
load(file=paste(path,"/output/variant_calling/VCF_STATS_INDIV.Rdata",sep=""))
load(file=paste(path,"/output/variant_calling/VCF_DEPTH_QUALITY_PER_SITE.Rdata",sep=""))
LOCATION=c()
for (i in 1:nrow(VCF_STATS_INDIV)){
  LOCATION[i]=location=str_sub(VCF_STATS_INDIV$SAMPLES[i],start=6,end=7)
}
VCF_STATS_INDIV$LOCATION=LOCATION

for (j in 1:nrow(VCF_SPECIES[[3]])){
  VCF_SPECIES[[3]][j,]$COUNT=VCF_SPECIES[[3]][j,]$COUNT/
    (VCF_SPECIES[[1]]$NBR_SNPS[which(VCF_SPECIES[[1]]$SPECIES==VCF_SPECIES[[3]]$SPECIES[j])])
}

labels_species=c()

for (sp in 1:length(levels(factor(VCF_STATS_INDIV$SPECIES)))){
  
  labels_species=c(labels_species,
                   paste(levels(factor(VCF_STATS_INDIV$SPECIES))[sp]," \n n = ",table(VCF_STATS_INDIV$SPECIES)[sp],sep=" "))
  
}

labels_species=as.vector(labels_species)

VCF_SPECIES$LOCATION=factor(VCF_SPECIES$LOCATION,levels=c("Li","Mu","Fa","Ga"))
VCF_STATS_INDIV$LOCATION=factor(VCF_STATS_INDIV$LOCATION,levels=c("Li","Mu","Fa","Ga"))
```

Row {.tabset}
-----------------------------------------------------------------------

### Filter 

```{r}
filter_variant = read.csv(paste(path,"/output/variant_calling/variant_filter.csv",sep=""))
filter_variant = filter_variant[,-1]

filter_variant$PERCE_NBR_SNPS_PASS_FROMINDEL5BP <- round(filter_variant$NBR_SNPS_PASS*100 / filter_variant$NBR_SNPS_AFTER_INDEL5BP,4)
filter_variant$PERCE_NBR_SNPS_PASS_FROMINDEL5BP <- color_bar("lightgreen")(filter_variant$PERCE_NBR_SNPS_PASS_FROMINDEL5BP)

filter_variant$PERCE_NBR_SNPS_PASS_FROMSNPRAW <- round(filter_variant$NBR_SNPS_PASS*100 / filter_variant$NBR_SNPS_RAW,4)
filter_variant$PERCE_NBR_SNPS_PASS_FROMSNPRAW <- color_bar("lightgreen")(filter_variant$PERCE_NBR_SNPS_PASS_FROMSNPRAW)

filter_variant$NBR_SNPS_PASS <- color_bar("lightgreen")(filter_variant$NBR_SNPS_PASS)

filter_variant$NBR_SNPS_FILTER_ReadPosRankSumm8 <- round(filter_variant$NBR_SNPS_FILTER_ReadPosRankSumm8*100 / filter_variant$NBR_SNPS_AFTER_INDEL5BP,4)
filter_variant$NBR_SNPS_FILTER_ReadPosRankSumm8 <- color_bar("lightgreen")(filter_variant$NBR_SNPS_FILTER_ReadPosRankSumm8)

filter_variant$NBR_SNPS_FILTER_MQRanksSumm12p5 <- round(filter_variant$NBR_SNPS_FILTER_MQRanksSumm12p5*100 / filter_variant$NBR_SNPS_AFTER_INDEL5BP,4)
filter_variant$NBR_SNPS_FILTER_MQRanksSumm12p5 <- color_bar("lightgreen")(filter_variant$NBR_SNPS_FILTER_MQRanksSumm12p5)

filter_variant$NBR_SNPS_FILTER_SOR3 <- round(filter_variant$NBR_SNPS_FILTER_SOR3*100 / filter_variant$NBR_SNPS_AFTER_INDEL5BP,4)
filter_variant$NBR_SNPS_FILTER_SOR3 <- color_bar("lightgreen")(filter_variant$NBR_SNPS_FILTER_SOR3)

filter_variant$NBR_SNPS_FILTER_FS60 <- round(filter_variant$NBR_SNPS_FILTER_FS60*100 / filter_variant$NBR_SNPS_AFTER_INDEL5BP,4)
filter_variant$NBR_SNPS_FILTER_FS60 <- color_bar("lightgreen")(filter_variant$NBR_SNPS_FILTER_FS60)

filter_variant$NBR_SNPS_FILTER_MQ40 <- round(filter_variant$NBR_SNPS_FILTER_MQ40*100 / filter_variant$NBR_SNPS_AFTER_INDEL5BP,4)
filter_variant$NBR_SNPS_FILTER_MQ40 <- color_bar("lightgreen")(filter_variant$NBR_SNPS_FILTER_MQ40)

filter_variant$NBR_SNPS_FILTER_AVGDPcutoff <- round(filter_variant$NBR_SNPS_FILTER_AVGDPcutoff*100 / filter_variant$NBR_SNPS_AFTER_INDEL5BP,4)
filter_variant$NBR_SNPS_FILTER_AVGDPcutoff <- color_bar("lightgreen")(filter_variant$NBR_SNPS_FILTER_AVGDPcutoff)

filter_variant$NBR_SNPS_FILTER_AVGDP10 <- round(filter_variant$NBR_SNPS_FILTER_AVGDP10*100 / filter_variant$NBR_SNPS_AFTER_INDEL5BP,4)
filter_variant$NBR_SNPS_FILTER_AVGDP10 <- color_bar("lightgreen")(filter_variant$NBR_SNPS_FILTER_AVGDP10)

filter_variant$NBR_SNPS_FILTER_AVGGQ20 <- round(filter_variant$NBR_SNPS_FILTER_AVGGQ20*100 / filter_variant$NBR_SNPS_AFTER_INDEL5BP,4)
filter_variant$NBR_SNPS_FILTER_AVGGQ20 <- color_bar("lightgreen")(filter_variant$NBR_SNPS_FILTER_AVGGQ20)

filter_variant$NBR_SNPS_FILTER_QUAL30 <- round(filter_variant$NBR_SNPS_FILTER_QUAL30*100 / filter_variant$NBR_SNPS_AFTER_INDEL5BP,4)
filter_variant$NBR_SNPS_FILTER_QUAL30 <- color_bar("lightgreen")(filter_variant$NBR_SNPS_FILTER_QUAL30)

filter_variant$NBR_SNPS_FILTER_QD10 <- round(filter_variant$NBR_SNPS_FILTER_QD10*100 / filter_variant$NBR_SNPS_AFTER_INDEL5BP,4)
filter_variant$NBR_SNPS_FILTER_QD10 <- color_bar("lightgreen")(filter_variant$NBR_SNPS_FILTER_QD10)

filter_variant$NBR_SNPS_AFTER_INDEL5BP <- round(filter_variant$NBR_SNPS_AFTER_INDEL5BP*100 / filter_variant$NBR_VARIANT_RAW,4)
filter_variant$NBR_SNPS_AFTER_INDEL5BP <- color_bar("lightgreen")(filter_variant$NBR_SNPS_AFTER_INDEL5BP)

filter_variant$NBR_MULTI_RAW <- color_bar("lightgreen")(filter_variant$NBR_MULTI_RAW)
filter_variant$NBR_INDELS_RAW <- color_bar("lightgreen")(filter_variant$NBR_INDELS_RAW)
filter_variant$NBR_SNPS_RAW <- color_bar("lightgreen")(filter_variant$NBR_SNPS_RAW)
filter_variant$NBR_VARIANT_RAW <- color_bar("lightgreen")(filter_variant$NBR_VARIANT_RAW)

for (j in c(2,3,4,5,18)){
  for (i in 1:length(filter_variant[,j])){
  
    filter_variant[i,j] =  paste(strsplit(filter_variant[i,j],">")[[1]][1],
            ">",
            round(as.numeric(strsplit(strsplit(filter_variant[i,j],"</span>")[[1]],">")[[1]][2])/1e6,3),
            " M",
            "</span>",
            sep=""
      )
      
  }
}


kbl(filter_variant,col.names = c("Species",
                           "Number of variants (raw)",
                           "Number of SNPs (raw)",
                           "Number of indels (raw)",
                           "Number of MNPs (raw)",
                           "SNPs (after removing 5bps around indels, %)",
                           "Filter QD<10 (%)",
                           "Filter QUAL<30 (%)",
                           "Filter AVGGQ<20 (%)",
                           "Filter AVGDP<10 (%)",
                           "Upper depth cutoff",
                           "Filter AVGDPcutoff<XX (%)",
                           "Filter MQ<40 (%)",
                           "Filter FS<60 (%)",
                           "Filter SOR<3 (%)",
                           "Filter MQRankSum<-12.5 (%)",
                           "Filter ReadPosRankSum<-8 (%)",
                           "Number SNPs pass",
                           "SNPs pass (% from indel5bp)",
                           "SNPs pass (% from raw)")
    ,escape = F,align=c(rep('c',times=20))) %>%
  kable_classic ("striped","hover",full_width = T,html_font = "Cambria")
```

### Species

```{r}
rownames(VCF_SPECIES[[1]]) = NULL
VCF_SPECIES[[1]]$SPECIES = c("Atherina boyeri",
                             "Alosa fallax",
                             "Coryphoblennius galerita",
                             "Coris julis",
                             "Dicentrarchus labrax",
                             "Diplodus puntazzo",
                             "Gobius niger",
                             "Hippocampus guttulatus",
                             "Lophius budegassa",
                             "Lithognathus mormyrus",
                             "Merluccius merluccius",
                             "Mullus surmuletus",
                             "Pagellus erythrinus",
                             "Serranus cabrilla",
                             "Spondyliosoma cantharus",
                             "Symphodus cinereus",
                             "Sardina pilchardus",
                             "Sarda sarda",
                             "Syngnathus typhle")
VCF_SPECIES[[1]]$SPECIES = cell_spec(VCF_SPECIES[[1]]$SPECIES,italic = T)
VCF_SPECIES[[1]]$NBR_SNPS <- color_bar("lightgreen")(VCF_SPECIES[[1]]$NBR_SNPS)
VCF_SPECIES[[1]]$TS_TV <- color_bar("lightgreen")(VCF_SPECIES[[1]]$TS_TV)

for (i in 1:length(VCF_SPECIES[[1]]$NBR_SNPS)){
  
VCF_SPECIES[[1]]$NBR_SNPS[i] =  paste(strsplit(VCF_SPECIES[[1]]$NBR_SNPS[i],">")[[1]][1],
        ">",
        round(as.numeric(strsplit(strsplit(VCF_SPECIES[[1]]$NBR_SNPS[i],"</span>")[[1]],">")[[1]][2])/1e6,3),
        " M",
        "</span>",
        sep=""
  )
  
}

kbl(VCF_SPECIES[[1]],col.names = c("Species",
                           "Number of SNPs",
                           "Transitions (Ts)",
                           "Transversions (Tv)",
                           "Ts/Tv")
    ,escape = F,align=c(rep('c',times=5))) %>%
  kable_classic ("striped","hover", full_width = T,html_font = "Cambria")

#kable(VCF_SPECIES[[1]], escape = F) %>%
#  kable_paper("hover", full_width = F)
```

### Substitution 

```{r, fig.height=150}
ppp<-ggplot(VCF_SPECIES[[3]],aes(x=TYPE,y=COUNT))+
  facet_grid(SPECIES ~ .)+
  geom_bar(alpha=0.75,stat='identity',aes(fill=TYPE))+
  scale_fill_viridis_d()+
  theme_classic()+
  xlab("Subsitution type")+
  ylab("Count")

ppp<-ggplotly(ppp,height=2500)  %>% partial_bundle() 
ppp
```

### Table {data-height=750}

```{r}
datatable(VCF_STATS_INDIV, 
          class = 'cell-border stripe',
          rownames = FALSE,
          filter = 'top', options = list(
          pageLength = 100,scrollY=500,autowidth=T,scrollX=100))
```

### Depth

```{r}
ppp <- ggplot(VCF_STATS_INDIV, aes(x=SPECIES, y=DEPTH))  +
  geom_violin(fill="chartreuse3",col="black",alpha=0.25) +
  geom_point(aes(col=LOCATION,
                 text=paste("Sample: ",SAMPLES, " \n Depth: ",round(DEPTH,3),sep="")),size = 1,
             shape=19,
             position='jitter',
             alpha=0.75) +
  stat_summary(fun=median, geom="point", size=2, color="orange")+
  scale_x_discrete(name="Species",
                   labels=labels_species)+
    scale_colour_manual(name = "Location",
                    labels = c("Gulf of Lion","Murcia","Faro","Gulf of Gascogne"),
                    values = brewer.pal(n = 4, name = "RdBu"))+
  ylab("Depth")+
  xlab("Species")+
  theme_classic()+
  scale_colour_manual(name = "Location",
                      labels = c("Gulf of Lion","Costa Calida","Algarve","Bay of Biscay"),
                      values = brewer.pal(n = 4, name = "RdBu"))+
  theme(legend.position="bottom")

ppp<-ggplotly(ppp,height=500,tooltip=c("text"))  %>% partial_bundle() 
ppp
```

### Number of SNPS

```{r}
ppp <- ggplot(VCF_STATS_INDIV, aes(x=SPECIES, y=NBR_SNPS))  +
  geom_violin(fill="chartreuse3",col="black",alpha=0.25) +
  geom_point(aes(col=LOCATION,
                 text=paste("Sample: ",SAMPLES, " \n Nbre NPS: ",round(NBR_SNPS,3),sep="")),size = 1,
             shape=19,
             position='jitter',
             alpha=0.75) +
  stat_summary(fun=median, geom="point", size=2, color="orange")+
  scale_x_discrete(name="Species",
                   labels=labels_species)+
  scale_colour_manual(name = "Location",
                    labels = c("Gulf of Lion","Murcia","Faro","Gulf of Gascogne"),
                    values = brewer.pal(n = 4, name = "RdBu"))+
  ylab("Number of SNPS")+
  xlab("Species")+
  theme_classic()+
  scale_colour_manual(name = "Location",
                      labels = c("Gulf of Lion","Costa Calida","Algarve","Bay of Biscay"),
                      values = brewer.pal(n = 4, name = "RdBu"))+
  theme(legend.position="bottom")

ppp<-ggplotly(ppp,height=500,tooltip=c("text"))  %>% partial_bundle() 
ppp
```

### SNPS same as reference

```{r}
ppp <- ggplot(VCF_STATS_INDIV, aes(x=SPECIES, y=PERCENTAGE_SNPS_SAME_AS_REF))  +
  geom_violin(fill="chartreuse3",col="black",alpha=0.25) +
  geom_point(aes(col=LOCATION,
                 text=paste("Sample: ",SAMPLES, " \n Percentage same as reference: ",round(PERCENTAGE_SNPS_SAME_AS_REF,3),sep="")),size = 1,
             shape=19,
             position='jitter',
             alpha=0.75) +
  stat_summary(fun=median, geom="point", size=2, color="orange")+
  scale_x_discrete(name="Species",
                   labels=labels_species)+
    scale_colour_manual(name = "Location",
                    labels = c("Gulf of Lion","Murcia","Faro","Gulf of Gascogne"),
                    values = brewer.pal(n = 4, name = "RdBu"))+
  ylab("Percentage same as reference (%)")+
  xlab("Species")+
  theme_classic()+
  scale_colour_manual(name = "Location",
                      labels = c("Gulf of Lion","Costa Calida","Algarve","Bay of Biscay"),
                      values = brewer.pal(n = 4, name = "RdBu"))+
  theme(legend.position="bottom")

ppp<-ggplotly(ppp,height=500,tooltip=c("text"))  %>% partial_bundle() 
ppp
```

### Missing genotype

```{r}
ppp <- ggplot(VCF_STATS_INDIV, aes(x=SPECIES, y=PERCENTAGE_MISSING_GENOTYPE))  +
  geom_violin(fill="chartreuse3",col="black",alpha=0.25) +
  geom_point(aes(col=LOCATION,
                 text=paste("Sample: ",SAMPLES, " \n Percentage of missing genotype: ",round(PERCENTAGE_MISSING_GENOTYPE,3),sep="")),size = 1,
             shape=19,
             position='jitter',
             alpha=0.75) +
  stat_summary(fun=median, geom="point", size=2, color="orange")+
  scale_x_discrete(name="Species",
                   labels=labels_species)+
  scale_colour_manual(name = "Location",
                    labels = c("Gulf of Lion","Murcia","Faro","Gulf of Gascogne"),
                    values = brewer.pal(n = 4, name = "RdBu"))+
  ylab("Percentage of missing genotype (%)")+
  xlab("Species")+
  theme_classic()+
  scale_colour_manual(name = "Location",
                      labels = c("Gulf of Lion","Costa Calida","Algarve","Bay of Biscay"),
                      values = brewer.pal(n = 4, name = "RdBu"))+
  theme(legend.position="bottom")

ppp<-ggplotly(ppp,height=500,tooltip=c("text"))  %>% partial_bundle() 
ppp
```

### Phased genotype percentage

```{r}
ppp <- ggplot(VCF_STATS_INDIV, aes(x=SPECIES, y=PHASED_GENOTYPE_PERCENTAGE))  +
  geom_violin(fill="chartreuse3",col="black",alpha=0.25) +
  geom_point(aes(col=LOCATION,
                 text=paste("Sample: ",SAMPLES, " \n Phased genotype percentage: ",round(PHASED_GENOTYPE_PERCENTAGE,3),sep="")),size = 1,
             shape=19,
             position='jitter',
             alpha=0.75) +
  stat_summary(fun=median, geom="point", size=2, color="orange")+
  scale_x_discrete(name="Species",
                   labels=labels_species)+
  scale_colour_manual(name = "Location",
                    labels = c("Gulf of Lion","Murcia","Faro","Gulf of Gascogne"),
                    values = brewer.pal(n = 4, name = "RdBu"))+
  ylab("Phased genotype percentage (%)")+
  xlab("Species")+
  theme_classic()+
  scale_colour_manual(name = "Location",
                      labels = c("Gulf of Lion","Costa Calida","Algarve","Bay of Biscay"),
                      values = brewer.pal(n = 4, name = "RdBu"))+
  theme(legend.position="bottom")

ppp<-ggplotly(ppp,height=500,tooltip=c("text"))  %>% partial_bundle() 
ppp
```

### Phased genotype number

```{r}
ppp <- ggplot(VCF_STATS_INDIV, aes(x=SPECIES, y=PHASED_GENOTYPE_NBR))  +
  geom_violin(fill="chartreuse3",col="black",alpha=0.25) +
  geom_point(aes(col=LOCATION,
                 text=paste("Sample: ",SAMPLES, " \n Phased genotype number: ",round(PHASED_GENOTYPE_NBR,3),sep="")),size = 1,
             shape=19,
             position='jitter',
             alpha=0.75) +
  stat_summary(fun=median, geom="point", size=2, color="orange")+
  scale_x_discrete(name="Species",
                   labels=labels_species)+
  scale_colour_manual(name = "Location",
                    labels = c("Gulf of Lion","Murcia","Faro","Gulf of Gascogne"),
                    values = brewer.pal(n = 4, name = "RdBu"))+
  ylab("Phased genotype number")+
  xlab("Species")+
  theme_classic()+
  scale_colour_manual(name = "Location",
                      labels = c("Gulf of Lion","Costa Calida","Algarve","Bay of Biscay"),
                      values = brewer.pal(n = 4, name = "RdBu"))+
  theme(legend.position="bottom")

ppp<-ggplotly(ppp,height=500,tooltip=c("text"))  %>% partial_bundle() 
ppp
```

### Number of genotype

```{r}
ppp <- ggplot(VCF_STATS_INDIV, aes(x=SPECIES, y=GENOTYPE_NBR))  +
  geom_violin(fill="chartreuse3",col="black",alpha=0.25) +
  geom_point(aes(col=LOCATION,
                 text=paste("Sample: ",SAMPLES, " \n Nbr of genotypes: ",round(GENOTYPE_NBR,3),sep="")),size = 1,
             shape=19,
             position='jitter',
             alpha=0.75) +
  stat_summary(fun=median, geom="point", size=2, color="orange")+
  scale_x_discrete(name="Species",
                   labels=labels_species)+
  scale_colour_manual(name = "Location",
                    labels = c("Gulf of Lion","Murcia","Faro","Gulf of Gascogne"),
                    values = brewer.pal(n = 4, name = "RdBu"))+
  ylab("Nbr of genotypes")+
  xlab("Species")+
  theme_classic()+
  scale_colour_manual(name = "Location",
                      labels = c("Gulf of Lion","Costa Calida","Algarve","Bay of Biscay"),
                      values = brewer.pal(n = 4, name = "RdBu"))+
  theme(legend.position="bottom")

ppp<-ggplotly(ppp,height=500,tooltip=c("text"))  %>% partial_bundle() 
ppp
```

### SNPs Ts/Tv

```{r}
ppp <- ggplot(VCF_STATS_INDIV, aes(x=SPECIES, y=SNP_TS_TV))  +
  geom_violin(fill="chartreuse3",col="black",alpha=0.25) +
  geom_point(aes(col=LOCATION,
                 text=paste("Sample: ",SAMPLES, " \n Snps Ts/Tv: ",round(SNP_TS_TV,3),sep="")),size = 1,
             shape=19,
             position='jitter',
             alpha=0.75) +
  stat_summary(fun=median, geom="point", size=2, color="orange")+
  scale_x_discrete(name="Species",
                   labels=labels_species)+
  scale_colour_manual(name = "Location",
                    labels = c("Gulf of Lion","Murcia","Faro","Gulf of Gascogne"),
                    values = brewer.pal(n = 4, name = "RdBu"))+
  ylab("Snps Ts/Tv")+
  xlab("Species")+
  theme_classic()+
  scale_colour_manual(name = "Location",
                      labels = c("Gulf of Lion","Costa Calida","Algarve","Bay of Biscay"),
                      values = brewer.pal(n = 4, name = "RdBu"))+
  theme(legend.position="bottom")

ppp<-ggplotly(ppp,height=500,tooltip=c("text"))  %>% partial_bundle() 
ppp
```

### SNPs Het/Hom

```{r}
ppp <- ggplot(VCF_STATS_INDIV, aes(x=SPECIES, y=HET_HOM))  +
  geom_violin(fill="chartreuse3",col="black",alpha=0.25) +
  geom_point(aes(col=LOCATION,
                 text=paste("Sample: ",SAMPLES, " \n Snps Het/Hom: ",round(HET_HOM,3),sep="")),size = 1,
             shape=19,
             position='jitter',
             alpha=0.75) +
  stat_summary(fun=median, geom="point", size=2, color="orange")+
  scale_x_discrete(name="Species",
                   labels=labels_species)+
  scale_colour_manual(name = "Location",
                    labels = c("Gulf of Lion","Murcia","Faro","Gulf of Gascogne"),
                    values = brewer.pal(n = 4, name = "RdBu"))+
  ylab("Snps Het/Hom")+
  xlab("Species")+
  theme_classic()+
  scale_colour_manual(name = "Location",
                      labels = c("Gulf of Lion","Costa Calida","Algarve","Bay of Biscay"),
                      values = brewer.pal(n = 4, name = "RdBu"))+
  theme(legend.position="bottom")

ppp<-ggplotly(ppp,height=500,tooltip=c("text"))  %>% partial_bundle() 
ppp
```

### Depth distribution
```{r}
VCF_DEPTH_QUALITY_PER_SITE = VCF_DEPTH_QUALITY_PER_SITE[-1,]
for (i in 1:nrow(VCF_DEPTH_QUALITY_PER_SITE)){
  if (VCF_DEPTH_QUALITY_PER_SITE[i,]$FILTER != "PASS"){
    VCF_DEPTH_QUALITY_PER_SITE[i,]$FILTER = "FILTERED"
  }
}
```

```{r, fig.height=150}
ppp<-ggplot(VCF_DEPTH_QUALITY_PER_SITE[VCF_DEPTH_QUALITY_PER_SITE$DEPTH<=100,], aes(x=DEPTH,fill=FILTER)) +
  geom_histogram(aes(y=..density..), position="identity", alpha=0.5,binwidth = 1)+
  geom_density(alpha=0.6)+
  geom_vline(xintercept=20,
             linetype="dashed")+
  xlim(c(0,65))+
  scale_fill_viridis_d(begin=0.1,end=0.9)+
  labs(title="Depth",x="Depth", y = "Freq.")+
  theme_classic()+
  facet_grid(SPECIES ~ .)

ppp<-ggplotly(ppp,height=2500)  %>% partial_bundle() %>% hide_legend()
ppp
```

### Quality distribution

```{r, fig.height=150}
ppp<-ggplot(VCF_DEPTH_QUALITY_PER_SITE, aes(x=QUALITY,fill=FILTER)) +
  geom_histogram(aes(y=..density..), position="identity", alpha=0.5,binwidth = 100)+
  geom_density(alpha=0.6)+
  geom_vline(xintercept=20,
             linetype="dashed")+
  xlim(c(0,5e3))+
  scale_fill_viridis_d(begin=0.1,end=0.9)+
  labs(title="Quality",x="log[Quality]", y = "Freq.")+
  theme_classic()+
  facet_grid(SPECIES ~ .)

ppp<-ggplotly(ppp,height=2500)  %>% partial_bundle() %>% hide_legend()
ppp
```

Packages used {data-icon="fa-map"}
=======================================================================

```{r}
installed.packages()[names(sessionInfo()$otherPkgs), "Version"]
```